Self-attention transfer networks for speech emotion recognition
Authors
Abstract
A crucial element of human–machine interaction, the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models. One vital challenge in speech emotion recognition (SER) is how to learn robust and discriminative representations from speech. Meanwhile, although machine learning methods have been widely applied in SER research, the inadequate amount of available annotated data has become a bottleneck that impedes the extended application of such techniques (e.g., deep neural networks). To address this issue, we present a method that combines knowledge transfer and self-attention for SER tasks. Here, we apply the log-Mel spectrogram with deltas and delta-deltas as input. Moreover, given that emotions are time-dependent, we apply Temporal Convolutional Neural Networks (TCNs) to model the variations in emotions. We further introduce an attention mechanism, which is based on a self-attention algorithm, in order to learn long-term dependencies. The Self-Attention Transfer Network (SATN), our proposed approach, takes advantage of autoencoders for a source task, then for recognition, followed by transferring the learned knowledge into SER. Evaluation on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset demonstrates the effectiveness of the novel model.
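The two input-side ideas named in the abstract, stacking a log-Mel spectrogram with its deltas and delta-deltas and applying self-attention over time, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `delta` and `self_attention`, the regression window `width=2`, the 100 x 40 feature shape, and the random placeholder standing in for a real log-Mel spectrogram are all assumptions made for illustration. The attention shown is plain unparameterized scaled dot-product attention (queries, keys, and values are the input itself), not the SATN attention transfer mechanism.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based delta features along the time axis.

    features: array of shape (frames, bins).
    width: hypothetical half-window size (an assumption, not from the paper).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the first/last frame so every frame has `width` neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def self_attention(x):
    """Unparameterized scaled dot-product self-attention over frames.

    x: array of shape (frames, dim). Returns (context, weights).
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


# Placeholder for a real log-Mel spectrogram: 100 frames x 40 Mel bins.
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((100, 40))

d1 = delta(log_mel)   # deltas
d2 = delta(d1)        # delta-deltas
stacked = np.stack([log_mel, d1, d2], axis=0)  # shape (3, frames, bins)

context, weights = self_attention(log_mel)
```

Stacking the three feature maps as channels gives an image-like input of shape (3, frames, bins), and each row of `weights` is a distribution over all frames, which is how self-attention can relate distant time steps directly.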
Similar resources
Feature Transfer Learning for Speech Emotion Recognition
Speech Emotion Recognition (SER) has achieved some substantial progress in the past few decades since the dawn of emotion and speech research. In many aspects, various research efforts have been made in an attempt to achieve human-like emotion recognition performance in real-life settings. However, with the availability of speech data obtained from different devices and varied acquisition condi...
Self-organizing boolean networks for speech recognition
We show the application of a self-organizing Boolean network to speech recognition. The model consists of a set of two-input Boolean gates which has to implement an n-to-1 Boolean mapping through a learning-by-example procedure. The training scheme is based on an optimization process (Simulated Annealing). This approach is applied to a simple phoneme recognition task, achieving high accuracy.
Neural Networks for Language Independent Emotion Recognition in Speech
This chapter introduces a neural network based approach for the identification of human affective state in speech signals. A group of potential features are first identified and extracted to represent the characteristics of different emotions. To reduce the dimensionality of the feature space, whilst increasing the discriminatory power of the features, a systematic feature selection approach wh...
Progressive Neural Networks for Transfer Learning in Emotion Recognition
Many paralinguistic tasks are closely related and thus representations learned in one domain can be leveraged for another. In this paper, we investigate how knowledge can be transferred between three paralinguistic tasks: speaker, emotion, and gender recognition. Further, we extend this problem to cross-dataset tasks, asking how knowledge captured in one emotion dataset can be transferred to an...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Journal
Journal title: Virtual Reality & Intelligent Hardware
Year: 2021
ISSN: 2096-5796, 2666-1209
DOI: https://doi.org/10.1016/j.vrih.2020.12.002